Don't delete response collectors in a transaction #250
Conversation
@@ -64,20 +64,20 @@ build:
export ARTIFACT_PASSWORD=$REPO_VATICLE_PASSWORD
Temporarily comment out these tests while they are failing due to the unreleased Cluster.
@@ -24,7 +24,6 @@ load("//tools:cluster_test_rule.bzl", "typedb_cluster_py_test")
load("@vaticle_bazel_distribution//artifact:rules.bzl", "artifact_extractor")
We restructured the test package a little, introducing a specific BUILD file for the integration tests only, which did not exist before.
@@ -0,0 +1,53 @@
#
This is the new BUILD file that was introduced specifically for integration tests
@@ -0,0 +1,59 @@
#
We deleted test_concurrent because it had no purpose other than being a debugging playground, and we already have test_debug for that. Now we introduce test_stream, with the goal of one day running it in CI, as it contains a useful test case (one that failed without this PR, but passes with it). However, we're not adding it to CI just yet, as there is substantial technical debt in the test infrastructure blocking it and we have other priorities.
@@ -84,7 +84,7 @@ def __init__(self, code: int, message: str):
MISSING_DB_NAME = ClientErrorMessage(7, "Database name cannot be empty.")
If UNKNOWN_REQUEST_ID had printed out the response, we could have saved some time debugging this time round. We now print the response to aid debugging in the future. Of course, if it ends up printing sensitive data, the reporter can mask it when reporting the error.
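As a sketch of the idea (the error code and message template here are illustrative, not the driver's actual values), the error message can interpolate the unrecognised response so that a bug report carries the context needed to debug it:

```python
# Hypothetical sketch: an error message that embeds the offending response,
# so debugging does not require reproducing the failure.
class ClientErrorMessage:
    def __init__(self, code: int, template: str):
        self.code = code
        self.template = template

    def message(self, *args) -> str:
        return "[CLI%02d] %s" % (self.code, self.template % args)

# Before, the error named only the request id; now the raw response is
# included as well (error code 9 is an assumption for this sketch).
UNKNOWN_REQUEST_ID = ClientErrorMessage(
    9, "Received a response with unknown request id '%s': %s")

print(UNKNOWN_REQUEST_ID.message("req-42", "<answer payload>"))
```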
@@ -63,9 +63,6 @@ def stream(self, req: transaction_proto.Transaction.Req) -> Iterator[transaction
self._dispatcher.dispatch(req)
Here we remove done from BidirectionalStream. Response Collectors will now be bound to the lifetime of the transaction. We noticed this is also more in line with the terminology we use, where you "close" a Collector precisely when the enclosing transaction is closed.
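The revised lifecycle can be sketched roughly as follows (these are simplified stand-in classes, not the driver's actual ones): collectors are owned by the transaction and are closed all together when it closes, rather than being deleted individually when a stream reports done.

```python
import queue

# Hypothetical sketch: response collectors live as long as their transaction.
class ResponseCollector:
    def __init__(self):
        self._responses = queue.Queue()
        self.closed = False

    def put(self, response):
        if self.closed:
            raise RuntimeError("collector is closed")
        self._responses.put(response)

    def close(self):
        self.closed = True

class Transaction:
    def __init__(self):
        self._collectors = {}  # request id -> ResponseCollector

    def collector(self, request_id):
        # Collectors are created on demand and never removed early.
        return self._collectors.setdefault(request_id, ResponseCollector())

    def close(self):
        # The transaction's close is the single place collectors are closed.
        for collector in self._collectors.values():
            collector.close()

tx = Transaction()
c = tx.collector("req-1")
c.put("DONE")        # a late DONE still has a live bucket to land in
tx.close()
print(c.closed)      # -> True
```

Since transactions are short-lived, deferring cleanup to close() costs little.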
What is the goal of this PR?
We no longer delete response collectors in a transaction after receiving a response to a "single" request, or receiving a "DONE" message in a stream. This fixes a possible error when loading 50+ answers in one query and then performing a second query.
What are the changes implemented in this PR?
We had previously added code to clean up used response collectors in #247. But this broke in the scenario where we open a transaction, run a query that loads 51 answers (the prefetch size + 1), and then run a second query. The server would respond to the first query with: 50 answers -> CONTINUE -> 1 answer [compensating for latency] -> DONE. The client would respond to CONTINUE with STREAM to keep iterating, and the server would respond to STREAM with a 2nd DONE message.
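The exchange above can be simulated to show where the second DONE comes from (this is an illustrative model with an assumed prefetch size of 50, not the server's actual code):

```python
# Hypothetical simulation of the server's responses to one query: a burst of
# prefetch-size answers, CONTINUE, the latency-compensating remainder, DONE,
# and one further DONE per subsequent STREAM request.
PREFETCH_SIZE = 50

def server_responses(total_answers, stream_requests):
    sent = 0
    # Initial burst: up to prefetch-size answers, then CONTINUE.
    for _ in range(min(PREFETCH_SIZE, total_answers)):
        sent += 1
        yield ("ANSWER", sent)
    yield ("CONTINUE", None)
    # Latency compensation: remaining answers, then DONE.
    while sent < total_answers:
        sent += 1
        yield ("ANSWER", sent)
    yield ("DONE", None)
    # The client answered CONTINUE with STREAM; each STREAM with no
    # remaining answers earns another DONE.
    for _ in range(stream_requests):
        yield ("DONE", None)

msgs = list(server_responses(total_answers=51, stream_requests=1))
dones = [m for m in msgs if m[0] == "DONE"]
print(len(dones))  # -> 2: the second DONE is the one that must not be lost
```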
The iterator for query 1 finishes as soon as we see the first DONE message, so we stop reading responses at that point, meaning the second DONE may never be read by the client. But opening the iterator for query 2 causes us to continue reading messages from the transaction stream. Note that we have no control over which request is being "currently served": all responses use the same pipeline, the same gRPC stream. That's why we have the Response Collectors: when we get a response for a request that is different from the one we actually asked for, we need to store it in its respective Collector bucket.
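The demultiplexing described above can be sketched in a few lines (an illustrative model, not the driver's actual classes): every response arriving on the shared stream is routed into a per-request-id bucket rather than assumed to answer the request we just sent.

```python
from collections import defaultdict, deque

# Hypothetical sketch: one gRPC stream carries responses for all requests,
# so each response is filed under its request id.
class ResponseDemultiplexer:
    def __init__(self):
        self._collectors = defaultdict(deque)  # request id -> bucket

    def on_response(self, request_id, response):
        self._collectors[request_id].append(response)

    def take(self, request_id):
        return self._collectors[request_id].popleft()

demux = ResponseDemultiplexer()
# Responses for two interleaved requests arrive on the same stream:
demux.on_response("query-1", "DONE")    # the late DONE for the first query
demux.on_response("query-2", "ANSWER")
print(demux.take("query-2"))  # -> ANSWER; query-1's DONE stays in its bucket
```

Deleting the "query-1" bucket after its first DONE is exactly what left the second DONE with nowhere to go.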
We could mitigate the issue by patching the server, but its current behaviour is actually pretty intuitive - if you send it a STREAM request and it has no more answers, it responds with DONE. We could change it to not respond at all, but that would be adding complexity where it is not really necessary to do so.
So instead, we're reverting to the old client behaviour, where the response collectors follow the lifetime of the Transaction, noting that Transactions are typically short-lived, so cleanup will be performed in a timely manner anyway.